Histograms with ggplot2

Let's go over how to create histograms with ggplot2. Refer to the video for the full explanation! A quick note: we are going to show a lot of what ggplot2 can do, which is not the same as what you should do in a real plot!

Load Data

We'll use the movies dataset that comes with ggplot2 (from ggplot2 2.0.0 onward it lives in the separate ggplot2movies package):

In [17]:
library(ggplot2)
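# Draw a random sample of 1000 movies.
# (With ggplot2 2.0.0 or later, run library(ggplot2movies) first to get `movies`.)
df <- movies[sample(nrow(movies), 1000), ]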
In [18]:
head(df)
Out[18]:
       title                                                  year  length  budget  rating  Action  Animation  Comedy  Drama  Documentary  Romance  Short
21569  Gunan il guerriero                                     1983  81      NA      2       1       0          0       0      0            0        0
39820  Phantom Brother                                        1988  92      NA      2       0       0          0       0      0            0        0
47662  Snow                                                   1996  81      NA      7.2     0       0          0       1      0            0        0
41080  Prelude                                                2000  6       NA      6.3     0       0          0       0      0            0        1
58741  Zwaarmoedige verhalen voor bij de centrale verwarming  1975  95      NA      5       0       0          1       1      0            0        0
47430  Sleepy-Time Tom                                        1951  7       NA      6.7     0       1          1       0      0            0        1
# ... with 12 more columns: votes, r1 to r10, mpaa
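
If `movies` is not found on your installation, the likely reason is that newer ggplot2 releases keep the dataset in the ggplot2movies package. Below is a minimal sketch of a loader that handles both cases. The simulated fallback data frame is purely a hypothetical stand-in (it only has a rating column) so that the histogram examples still run, and the seed value is an arbitrary choice to make the random sample repeatable.

```r
set.seed(101)  # arbitrary seed so the random sample is repeatable

if (requireNamespace("ggplot2movies", quietly = TRUE)) {
  # Real data: the movies table from the ggplot2movies package
  movies <- ggplot2movies::movies
} else {
  # Hypothetical stand-in: simulated ratings clipped to the 1-10 range
  movies <- data.frame(
    rating = round(pmin(pmax(rnorm(5000, mean = 6, sd = 1.5), 1), 10), 1)
  )
}

# Same sampling step as above; drop = FALSE keeps a data frame
# even when there is only one column
df <- movies[sample(nrow(movies), 1000), , drop = FALSE]
```

With the fallback data only the rating column exists, so head(df) will look different from the output shown here.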

Using qplot()

Basics

In [21]:
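# I() marks alpha as a constant; a bare alpha=0.8 is treated as a mapped aesthetic and adds a legend
qplot(rating,data=df,geom='histogram',binwidth=0.1,alpha=I(0.8))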
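
qplot() is deprecated as of ggplot2 3.4.0, so on a recent installation it may print a deprecation warning. Here is a rough sketch of the same basic histogram written with ggplot() instead. The `sim` data frame is a simulated, hypothetical stand-in for df so that the snippet runs on its own.

```r
library(ggplot2)

set.seed(101)
# Hypothetical stand-in for df: 1000 simulated ratings between 1 and 10
sim <- data.frame(
  rating = round(pmin(pmax(rnorm(1000, mean = 6, sd = 1.5), 1), 10), 1)
)

# alpha is set outside aes(), so it is a constant and no legend is drawn
p <- ggplot(sim, aes(x = rating)) +
  geom_histogram(binwidth = 0.1, alpha = 0.8)

# Evaluate p (or call print(p)) to draw the plot
```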

Using ggplot()

Let's see how we can really expand on this by using ggplot()! The syntax starts off with the base plot:

In [23]:
# ggplot(data, aesthetics)
pl <- ggplot(df,aes(x=rating))
In [24]:
# Add Histogram Geometry
pl + geom_histogram()
stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
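
The message above means the histogram fell back to 30 bins across the range of the data (newer ggplot2 versions word it as using `bins = 30`). As a sketch of how you might pick a width yourself, the snippet below computes that default and, as one alternative, the Freedman-Diaconis rule (2 * IQR / n^(1/3)). The rule is just one common heuristic brought in for illustration, not something this notebook relies on, and `sim` is again a simulated stand-in for df.

```r
library(ggplot2)

set.seed(101)
# Hypothetical stand-in for df
sim <- data.frame(
  rating = round(pmin(pmax(rnorm(1000, mean = 6, sd = 1.5), 1), 10), 1)
)

# What the message describes: the data range split into 30 equal bins
default_width <- diff(range(sim$rating)) / 30

# Freedman-Diaconis rule: 2 * IQR / n^(1/3)
fd_width <- 2 * IQR(sim$rating) / nrow(sim)^(1/3)

# Passing binwidth explicitly silences the message
p_default <- ggplot(sim, aes(x = rating)) + geom_histogram(binwidth = default_width)
p_fd      <- ggplot(sim, aes(x = rating)) + geom_histogram(binwidth = fd_width)
```

Whichever rule you start from, it is worth trying a few widths by eye, since the best choice depends on the data.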

Adding Color

In [41]:
pl <- ggplot(df,aes(x=rating))
pl + geom_histogram(binwidth=0.1,color='red',fill='pink')

Adding Labels

In [65]:
pl <- ggplot(df,aes(x=rating))
pl + geom_histogram(binwidth=0.1,color='red',fill='pink') + xlab('Movie Ratings')+ ylab('Occurrences') + ggtitle('Movie Ratings')
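
As an aside, the three label calls can also be collapsed into a single labs() call. A small sketch, using a simulated stand-in (`sim`) for df:

```r
library(ggplot2)

set.seed(101)
# Hypothetical stand-in for df
sim <- data.frame(
  rating = round(pmin(pmax(rnorm(1000, mean = 6, sd = 1.5), 1), 10), 1)
)

# labs() sets the axis labels and the title in one place
p <- ggplot(sim, aes(x = rating)) +
  geom_histogram(binwidth = 0.1, color = 'red', fill = 'pink') +
  labs(x = 'Movie Ratings', y = 'Occurrences', title = 'Movie Ratings')
```

Both styles give the same plot, so which one to use is a matter of taste.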

Change Alpha (Transparency)

In [49]:
pl <- ggplot(df,aes(x=rating))
pl + geom_histogram(binwidth=0.1,fill='blue',alpha=0.4) + xlab('Movie Ratings')+ ylab('Occurrences')
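
Transparency becomes most useful when histograms overlap. The sketch below overlays two groups on the same axes. The data and the `group` column are invented for illustration and are not part of the movies dataset.

```r
library(ggplot2)

set.seed(101)
# Hypothetical data: two simulated groups of ratings
sim <- data.frame(
  rating = round(pmin(pmax(c(rnorm(500, 5.5, 1.5), rnorm(500, 6.5, 1.5)), 1), 10), 1),
  group  = rep(c('A', 'B'), each = 500)
)

# position = 'identity' draws the groups on top of each other instead of
# stacking them, and alpha lets the one behind show through
p <- ggplot(sim, aes(x = rating, fill = group)) +
  geom_histogram(binwidth = 0.25, alpha = 0.4, position = 'identity')
```

Without position = 'identity' the bars would be stacked, and the transparency would add little.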

Linetypes

The available linetypes are "blank", "solid", "dashed", "dotted", "dotdash", "longdash", and "twodash". You would rarely use these with a histogram, but here is one just to show your options:

In [52]:
pl <- ggplot(df,aes(x=rating))
pl + geom_histogram(binwidth=0.1,color='blue',fill='pink',linetype='dotted') + xlab('Movie Ratings')+ ylab('Occurrences')

Advanced Aesthetics

We can add an aes() argument to geom_histogram() for some more advanced features. We won't go too deep into these, but ggplot2 gives you the ability to edit color and fill scales.

In [57]:
# Map the fill color to the number of observations in each bin
pl <- ggplot(df,aes(x=rating))
pl + geom_histogram(binwidth=0.1,aes(fill=..count..)) + xlab('Movie Ratings')+ ylab('Occurrences')

You can further edit this by adding the scale_fill_gradient() function to your ggplot objects:

In [63]:
# Save the plot so we can try different fill scales on it
pl <- ggplot(df,aes(x=rating))
pl2 <- pl + geom_histogram(binwidth=0.1,aes(fill=..count..)) + xlab('Movie Ratings')+ ylab('Occurrences')
In [59]:
# scale_fill_gradient('Label',low=color1,high=color2)
pl2 + scale_fill_gradient('Count',low='blue',high='red')
In [62]:
# scale_fill_gradient('Label',low=color1,high=color2)
pl2 + scale_fill_gradient('Count',low='darkgreen',high='lightblue')
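
If you are on ggplot2 3.3.0 or later, the `..count..` notation can be written as after_stat(count), and from 3.4.0 the double-dot form triggers a deprecation warning. Here is a sketch of the gradient example in that newer spelling, again on a simulated stand-in (`sim`) for df:

```r
library(ggplot2)

set.seed(101)
# Hypothetical stand-in for df
sim <- data.frame(
  rating = round(pmin(pmax(rnorm(1000, mean = 6, sd = 1.5), 1), 10), 1)
)

# after_stat(count) refers to the per-bin count computed by the histogram stat
p <- ggplot(sim, aes(x = rating)) +
  geom_histogram(binwidth = 0.1, aes(fill = after_stat(count))) +
  scale_fill_gradient('Count', low = 'blue', high = 'red')
```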

Adding density plot

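You can overlay a kernel density estimate on the histogram. Mapping y to ..density.. rescales the bars so that they share a y-axis with the density curve: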

In [68]:
# Histogram on the density scale, plus a kernel density curve
pl <- ggplot(df,aes(x=rating))
pl + geom_histogram(aes(y=..density..)) + geom_density(color='red')
stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
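
To see why the density scale makes the two layers comparable, here is a small sketch that checks the bar areas add up to 1, just like the area under a density curve. It uses after_stat(density), which needs ggplot2 3.3.0 or later, an explicit binwidth of 0.25 chosen arbitrarily, and a simulated stand-in (`sim`) for df.

```r
library(ggplot2)

set.seed(101)
# Hypothetical stand-in for df
sim <- data.frame(
  rating = round(pmin(pmax(rnorm(1000, mean = 6, sd = 1.5), 1), 10), 1)
)

p <- ggplot(sim, aes(x = rating)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.25,
                 fill = 'pink', color = 'grey40') +
  geom_density(color = 'red')

# layer_data() returns the computed values behind a layer (1 = the histogram)
bars <- layer_data(p, 1)

# Bar height times bar width, summed over all bins; numerically this is 1
area <- sum(bars$density * (bars$xmax - bars$xmin))
```

On the default count scale the bars would be far taller than the curve, which is why the plot above maps y to the density.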

Alright! That's all for now on histograms. We've shown that ggplot2 offers a lot of customization, though it definitely takes time to get used to!